Supplementary Materials for Balanced Meta-Softmax for Long-Tailed Visual Recognition

Neural Information Processing Systems

A careful implementation is required for instance segmentation tasks. Firstly, we define f as

f(x) := l(θ) + t    (24)

where l(θ) and t are as previously defined in the main paper.



Q1: Explanation about the mismatch (1/4 and 1) between the theory (Theorem 2 and Corollary 2.1) and practice

Neural Information Processing Systems

We will answer the major points below and address all remaining ones in the final version. We leave further discussion of the convergence rate to future work. Eqn. 11 in [B] is generic (a superset of most loss-engineering methods such as [3, 29, A]); it uses bi-level optimization. We will add a discussion of [3, A, B] in the final version. Q2: The Meta Sampler shares a similar idea with [12, 24, 27] (CIFAR10-LT); theirs are instance-based while ours is class-based (fewer parameters and a simpler optimization landscape).


Review for NeurIPS paper: Balanced Meta-Softmax for Long-Tailed Visual Recognition

Neural Information Processing Systems

Weaknesses: The equations (3) and (4) are, however, very similar to [3] and [A, B] in the way they force minor-class examples to have larger decision values (i.e., \exp \eta_j) during training. The proposed softmax seems particularly similar to Eq. (11) in [B]. The authors should have cited these papers and provided further discussion and comparison. This point limits the novelty/significance of the paper. It is hard for me to judge the novelty of the proposed Meta Sampler.


Review for NeurIPS paper: Balanced Meta-Softmax for Long-Tailed Visual Recognition

Neural Information Processing Systems

The paper first shows that the softmax gives a biased gradient estimation under the long-tailed setup, and proposes a balanced softmax to accommodate the label distribution shift between training and testing. Theoretically, the authors derive the generalization bound for multiclass softmax regression. They then introduce a balanced meta-softmax procedure, using a complementary meta sampler to estimate the optimal class sample rate and further improve long-tailed learning. Experiments demonstrate that this outperforms SOTA long-tailed classification solutions on both visual recognition and instance segmentation tasks. The paper was reviewed by four reviewers, who found strengths and weaknesses. The strengths were that the idea is intuitive and simple to implement, the theoretical derivations in support of the method, and the good results.


Balanced Meta-Softmax for Long-Tailed Visual Recognition

Ren, Jiawei, Yu, Cunjun, Sheng, Shunan, Ma, Xiao, Zhao, Haiyu, Yi, Shuai, Li, Hongsheng

arXiv.org Machine Learning

Deep classifiers have achieved great success in visual recognition. However, real-world data is long-tailed by nature, leading to a mismatch between training and testing distributions. In this paper, we show that the Softmax function, though used in most classification tasks, gives a biased gradient estimation under the long-tailed setup. This paper presents Balanced Softmax, an elegant unbiased extension of Softmax, to accommodate the label distribution shift between training and testing. Theoretically, we derive the generalization bound for multiclass Softmax regression and show our loss minimizes the bound. In addition, we introduce Balanced Meta-Softmax, applying a complementary Meta Sampler to estimate the optimal class sample rate and further improve long-tailed learning. In our experiments, we demonstrate that Balanced Meta-Softmax outperforms state-of-the-art long-tailed classification solutions on both visual recognition and instance segmentation tasks.
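As a rough illustration of the idea described in the abstract, the sketch below implements a Balanced Softmax cross-entropy in NumPy: each logit is shifted by the log of its class's training sample count before normalization, which compensates for the long-tailed label prior. This is a minimal sketch based on the abstract's description, not the authors' reference implementation; all function and variable names are illustrative.

```python
import numpy as np

def balanced_softmax_ce(logits, labels, class_counts):
    """Cross-entropy with a Balanced Softmax: each logit eta_j is shifted
    by log n_j (the training sample count of class j) before the softmax
    normalization, compensating for the long-tailed label prior."""
    adjusted = logits + np.log(class_counts)            # broadcast over the batch
    adjusted -= adjusted.max(axis=1, keepdims=True)     # numerical stability
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# With uniform logits, a tail-class example (class 2, one training sample)
# incurs a larger loss than under the plain Softmax (uniform counts):
logits = np.zeros((1, 3))
labels = np.array([2])
tail_loss = balanced_softmax_ce(logits, labels, np.array([100.0, 100.0, 1.0]))
plain_loss = balanced_softmax_ce(logits, labels, np.ones(3))
```

When all class counts are equal, the log-count shift is constant across classes and the loss reduces exactly to the ordinary softmax cross-entropy.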


Meta-Learning for Stochastic Gradient MCMC

Gong, Wenbo, Li, Yingzhen, Hernández-Lobato, José Miguel

arXiv.org Machine Learning

Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has become increasingly popular for simulating posterior samples in large-scale Bayesian modeling. However, existing SG-MCMC schemes are not tailored to any specific probabilistic model; even a simple modification of the underlying dynamical system requires significant physical intuition. This paper presents the first meta-learning algorithm that allows automated design of the underlying continuous dynamics of an SG-MCMC sampler. The learned sampler generalizes Hamiltonian dynamics with state-dependent drift and diffusion, enabling fast traversal and efficient exploration of neural network energy landscapes. Experiments validate the proposed approach on both Bayesian fully connected neural network and Bayesian recurrent neural network tasks, showing that the learned sampler outperforms generic, hand-designed SG-MCMC algorithms and generalizes to different datasets and larger architectures.
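For context on the family of samplers this abstract refers to, here is a minimal sketch of Stochastic Gradient Langevin Dynamics (SGLD), the simplest SG-MCMC scheme that such learned, Hamiltonian-style samplers generalize. The standard-normal target and all names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    """One SGLD update: a gradient step on the log-posterior plus Gaussian
    noise whose variance (2 * step_size) matches the Langevin diffusion."""
    noise = rng.normal(0.0, np.sqrt(2.0 * step_size), size=np.shape(theta))
    return theta + step_size * grad_log_post(theta) + noise

# Toy example: sample from a standard normal posterior, where
# grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta, samples = 0.0, []
for t in range(50000):
    theta = sgld_step(theta, lambda x: -x, 0.01, rng)
    if t >= 5000:                      # discard burn-in
        samples.append(theta)
samples = np.asarray(samples)
```

The empirical mean and variance of the collected samples approach 0 and 1, the moments of the target; meta-learned SG-MCMC replaces this fixed drift-and-diffusion rule with learned, state-dependent functions.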